time sery dataset
- North America > United States > Massachusetts (0.04)
- North America > United States > Florida > Broward County (0.04)
- Asia > Middle East > Israel (0.04)
DenoGrad: Deep Gradient Denoising Framework for Enhancing the Performance of Interpretable AI Models
Alonso-Ramos, J. Javier, Aguilera-Martos, Ignacio, Herrera-Poyatos, Andrés, Herrera, Francisco
The performance of Machine Learning (ML) models, particularly those operating within the Interpretable Artificial Intelligence (Interpretable AI) framework, is significantly affected by the presence of noise in both training and production data. Denoising has therefore become a critical preprocessing step, typically categorized into instance removal and instance correction techniques. However, existing correction approaches often degrade performance or oversimplify the problem by altering the original data distribution. This leads to unrealistic scenarios and biased models, which is particularly problematic in contexts where interpretable AI models are employed, as their interpretability depends on the fidelity of the underlying data patterns. In this paper, we argue that defining noise independently of the solution may be ineffective, as its nature can vary significantly across tasks and datasets. Using a task-specific high quality solution as a reference can provide a more precise and adaptable noise definition. To this end, we propose DenoGrad, a novel Gradient-based instance Denoiser framework that leverages gradients from an accurate Deep Learning (DL) model trained on the target data -- regardless of the specific task -- to detect and adjust noisy samples. Unlike conventional approaches, DenoGrad dynamically corrects noisy instances, preserving problem's data distribution, and improving AI models robustness. DenoGrad is validated on both tabular and time series datasets under various noise settings against the state-of-the-art. DenoGrad outperforms existing denoising strategies, enhancing the performance of interpretable IA models while standing out as the only high quality approach that preserves the original data distribution.
- North America > United States (0.28)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
- Workflow (1.00)
- Research Report > New Finding (0.46)
- Health & Medicine (0.92)
- Government > Regional Government (0.45)
- North America > United States > Massachusetts (0.04)
- North America > United States > Florida > Broward County (0.04)
- Asia > Middle East > Israel (0.04)
TelecomTS: A Multi-Modal Observability Dataset for Time Series and Language Analysis
Feng, Austin, Varvarigos, Andreas, Panitsas, Ioannis, Fernandez, Daniela, Wei, Jinbiao, Guo, Yuwei, Chen, Jialin, Maatouk, Ali, Tassiulas, Leandros, Ying, Rex
Modern enterprises generate vast streams of time series metrics when monitoring complex systems, known as observability data. Unlike conventional time series from domains such as weather, observability data are zero-inflated, highly stochastic, and exhibit minimal temporal structure. Despite their importance, observability datasets are underrepresented in public benchmarks due to proprietary restrictions. Existing datasets are often anonymized and normalized, removing scale information and limiting their use for tasks beyond forecasting, such as anomaly detection, root-cause analysis, and multi-modal reasoning. To address this gap, we introduce TelecomTS, a large-scale observability dataset derived from a 5G telecommunications network. TelecomTS features heterogeneous, de-anonymized covariates with explicit scale information and supports a suite of downstream tasks, including anomaly detection, root-cause analysis, and a question-answering benchmark requiring multi-modal reasoning. Benchmarking state-of-the-art time series, language, and reasoning models reveals that existing approaches struggle with the abrupt, noisy, and high-variance dynamics of observability data. Our experiments also underscore the importance of preserving covariates' absolute scale, emphasizing the need for foundation time series models that natively leverage scale information for practical observability applications.
- North America > United States (0.14)
- Asia > Singapore (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Telecommunications > Networks (1.00)
- Information Technology > Networks (1.00)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.67)
Learning Soft Sparse Shapes for Efficient Time-Series Classification
Liu, Zhen, Luo, Yicheng, Li, Boyuan, Eldele, Emadeldeen, Wu, Min, Ma, Qianli
Shapelets are discriminative subsequences (or shapes) with high interpretability in time series classification. Due to the time-intensive nature of shapelet discovery, existing shapelet-based methods mainly focus on selecting discriminative shapes while discarding others to achieve candidate subsequence sparsification. However, this approach may exclude beneficial shapes and overlook the varying contributions of shapelets to classification performance. To this end, we propose a Soft sparse Shapes (SoftShape) model for efficient time series classification. Our approach mainly introduces soft shape sparsification and soft shape learning blocks. The former transforms shapes into soft representations based on classification contribution scores, merging lower-scored ones into a single shape to retain and differentiate all subsequence information. The latter facilitates intra- and inter-shape temporal pattern learning, improving model efficiency by using sparsified soft shapes as inputs. Specifically, we employ a learnable router to activate a subset of class-specific expert networks for intra-shape pattern learning. Meanwhile, a shared expert network learns inter-shape patterns by converting sparsified shapes into sequences. Extensive experiments show that SoftShape outperforms state-of-the-art methods and produces interpretable results.
- North America > Canada > Newfoundland and Labrador > Labrador (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Asia > Singapore (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
An Efficient Model Maintenance Approach for MLOps
Majidi, Forough, Khomh, Foutse, Li, Heng, Nikanjam, Amin
In recent years, many industries have utilized machine learning models (ML) in their systems. Ideally, machine learning models should be trained on and applied to data from the same distributions. However, the data evolves over time in many application areas, leading to data and concept drift, which in turn causes the performance of the ML models to degrade over time. Therefore, maintaining up to date ML models plays a critical role in the MLOps pipeline. Existing ML model maintenance approaches are often computationally resource intensive, costly, time consuming, and model dependent. Thus, we propose an improved MLOps pipeline, a new model maintenance approach and a Similarity Based Model Reuse (SimReuse) tool to address the challenges of ML model maintenance. We identify seasonal and recurrent distribution patterns in time series datasets throughout a preliminary study. Recurrent distribution patterns enable us to reuse previously trained models for similar distributions in the future, thus avoiding frequent retraining. Then, we integrated the model reuse approach into the MLOps pipeline and proposed our improved MLOps pipeline. Furthermore, we develop SimReuse, a tool to implement the new components of our MLOps pipeline to store models and reuse them for inference of data segments with similar data distributions in the future. Our evaluation results on four time series datasets demonstrate that our model reuse approach can maintain the performance of models while significantly reducing maintenance time and costs. Our model reuse approach achieves ML performance comparable to the best baseline, while being 15 times more efficient in terms of computation time and costs. Therefore, industries and practitioners can benefit from our approach and use our tool to maintain the performance of their ML models in the deployment phase to reduce their maintenance costs.
- Europe > Switzerland > Zürich > Zürich (0.07)
- North America > Canada > Quebec (0.04)
- Oceania > Australia > New South Wales (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Information Technology > Services (0.46)
- Education > Educational Setting > Online (0.46)
Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis
Gkillas, Alexandros, Lalos, Aris
Anomaly and missing data constitute a thorny problem in industrial applications. In recent years, deep learning enabled anomaly detection has emerged as a critical direction, however the improved detection accuracy is achieved with the utilization of large neural networks, increasing their storage and computational cost. Moreover, the data collected in edge devices contain user privacy, introducing challenges that can be successfully addressed by the privacy-preserving distributed paradigm, known as federated learning (FL). This framework allows edge devices to train and exchange models increasing also the communication cost. Thus, to deal with the increased communication, processing and storage challenges of the FL based deep anomaly detection NN pruning is expected to have significant benefits towards reducing the processing, storage and communication complexity. With this focus, a novel compression-based optimization problem is proposed at the server-side of a FL paradigm that fusses the received local models broadcast and performs pruning generating a more compressed model. Experiments in the context of anomaly detection and missing value imputation demonstrate that the proposed FL scenario along with the proposed compressed-based method are able to achieve high compression rates (more than $99.7\%$) with negligible performance losses (less than $1.18\%$ ) as compared to the centralized solutions.
Hypergraph-based multi-scale spatio-temporal graph convolution network for Time-Series anomaly detection
Multivariate time series anomaly detection technology plays an important role in many fields including aerospace, water treatment, cloud service providers, etc. Excellent anomaly detection models can greatly improve work efficiency and avoid major economic losses. However, with the development of technology, the increasing size and complexity of data, and the lack of labels for relevant abnormal data, it is becoming increasingly challenging to perform effective and accurate anomaly detection in high-dimensional and complex data sets. In this paper, we propose a hypergraph based spatiotemporal graph convolutional neural network model STGCN_Hyper, which explicitly captures high-order, multi-hop correlations between multiple variables through a hypergraph based dynamic graph structure learning module. On this basis, we further use the hypergraph based spatiotemporal graph convolutional network to utilize the learned hypergraph structure to effectively propagate and aggregate one-hop and multi-hop related node information in the convolutional network, thereby obtaining rich spatial information. Furthermore, through the multi-scale TCN dilated convolution module, the STGCN_hyper model can also capture the dependencies of features at different scales in the temporal dimension. An unsupervised anomaly detector based on PCA and GMM is also integrated into the STGCN_hyper model. Through the anomaly score of the detector, the model can detect the anomalies in an unsupervised way. Experimental results on multiple time series datasets show that our model can flexibly learn the multi-scale time series features in the data and the dependencies between features, and outperforms most existing baseline models in terms of precision, recall, F1-score on anomaly detection tasks. Our code is available on: https://git.ecdf.ed.ac.uk/msc-23-24/s2044819
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom (0.04)
- Information Technology (1.00)
- Water & Waste Management > Water Management > Lifecycle > Treatment (0.66)
A Language Model-Guided Framework for Mining Time Series with Distributional Shifts
Zhu, Haibei, El-Laham, Yousef, Fons, Elizabeth, Vyetrenko, Svitlana
Effective utilization of time series data is often constrained by the scarcity of data quantity that reflects complex dynamics, especially under the condition of distributional shifts. Existing datasets may not encompass the full range of statistical properties required for robust and comprehensive analysis. And privacy concerns can further limit their accessibility in domains such as finance and healthcare. This paper presents an approach that utilizes large language models and data source interfaces to explore and collect time series datasets. While obtained from external sources, the collected data share critical statistical properties with primary time series datasets, making it possible to model and adapt to various scenarios. This method enlarges the data quantity when the original data is limited or lacks essential properties. It suggests that collected datasets can effectively supplement existing datasets, especially involving changes in data distribution. We demonstrate the effectiveness of the collected datasets through practical examples and show how time series forecasting foundation models fine-tuned on these datasets achieve comparable performance to those models without fine-tuning.
- North America > United States (0.15)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Banking & Finance > Economy (0.93)
- Health & Medicine (0.88)
- Information Technology > Security & Privacy (0.66)
Advancing multivariate time series similarity assessment: an integrated computational approach
Tonle, Franck, Tonnang, Henri, Ndadji, Milliam, Tchendji, Maurice, Nzeukou, Armand, Senagi, Kennedy, Niassy, Saliou
Data mining, particularly the analysis of multivariate time series data, plays a crucial role in extracting insights from complex systems and supporting informed decision-making across diverse domains. However, assessing the similarity of multivariate time series data presents several challenges, including dealing with large datasets, addressing temporal misalignments, and the need for efficient and comprehensive analytical frameworks. To address all these challenges, we propose a novel integrated computational approach known as Multivariate Time series Alignment and Similarity Assessment (MTASA). MTASA is built upon a hybrid methodology designed to optimize time series alignment, complemented by a multiprocessing engine that enhances the utilization of computational resources. This integrated approach comprises four key components, each addressing essential aspects of time series similarity assessment, thereby offering a comprehensive framework for analysis. MTASA is implemented as an open-source Python library with a user-friendly interface, making it accessible to researchers and practitioners. To evaluate the effectiveness of MTASA, we conducted an empirical study focused on assessing agroecosystem similarity using real-world environmental data. The results from this study highlight MTASA's superiority, achieving approximately 1.5 times greater accuracy and twice the speed compared to existing state-of-the-art integrated frameworks for multivariate time series similarity assessment. It is hoped that MTASA will significantly enhance the efficiency and accessibility of multivariate time series analysis, benefitting researchers and practitioners across various domains. Its capabilities in handling large datasets, addressing temporal misalignments, and delivering accurate results make MTASA a valuable tool for deriving insights and aiding decision-making processes in complex systems.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > Germany (0.04)
- Asia > Vietnam (0.04)
- (6 more...)
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture (1.00)
- Government (0.68)